A Robust-Equitable Measure for Feature Ranking and Selection

نویسندگان

  • A. Adam Ding
  • Jennifer G. Dy
  • Yi Li
  • Yale Chang
چکیده

In many applications, not all the features used to represent data samples are important. Often only a few features are relevant for the prediction task. The choice of dependence measures often affect the final result of many feature selection methods. To select features that have complex nonlinear relationships with the response variable, the dependence measure should be equitable, a concept proposed by Reshef et al. (2011); that is, the dependence measure treats linear and nonlinear relationships equally. Recently, Kinney and Atwal (2014) gave a mathematical definition of self-equitability. In this paper, we introduce a new concept of robust-equitability and identify a robust-equitable copula dependence measure, the robust copula dependence (RCD) measure. RCD is based on the L1-distance of the copula density from uniform and we show that it is equitable under both equitability definitions. We also prove theoretically that RCD is much easier to estimate than mutual information. Because of these theoretical properties, the RCD measure has the following advantages compared to existing dependence measures: it is robust to different relationship forms and robust to unequal sample sizes of different features. Experiments on both synthetic and real-world data sets confirm the theoretical analysis, and illustrate the advantage of using the dependence measure RCD for feature selection.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Robust-Equitable Copula Dependence Measure for Feature Selection

Feature selection aims to select relevant features to improve the performance of predictors. Many feature selection methods depend on the choice of dependence measures. To select features that have complex nonlinear relationships with the response variable, the dependence measure should be equitable; i.e., it should treat linear and nonlinear relationships equally. In this paper, we introduce t...

متن کامل

Hierarchical Group Compromise Ranking Methodology Based on Euclidean–Hausdorff Distance Measure Under Uncertainty: An Application to Facility Location Selection Problem

Proposing a hierarchical group compromise method can be regarded as a one of major multi-attributes decision-making tool that can be introduced to rank the possible alternatives among conflict criteria. Decision makers’ (DMs’) judgments are considered as imprecise or fuzzy in complex and hesitant situations. In the group decision making, an aggregation of DMs’ judgments and fuzzy group compromi...

متن کامل

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain suitable features subset among all features. These methods search feature space for feature subsets which satisfies some criteria or optimizes several objective functions. The objective functions are divided into two main groups: filter and wrapper methods.  In filter methods, features subsets are selected due to some measu...

متن کامل

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has been increased. Credit card fraud is a crucial problem in banking and its danger is over increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...

متن کامل

IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF

Increasing the use of Internet and some phenomena such as sensor networks has led to an unnecessary increasing the volume of information. Though it has many benefits, it causes problems such as storage space requirements and better processors, as well as data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 18  شماره 

صفحات  -

تاریخ انتشار 2017